AITopics | yaodong yang

Collaborating Authors

yaodong yang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Neural Auto-Curricula

Neural Information Processing SystemsApr-24-2026, 23:50:08 GMT

When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of "who to compete with" (i.e., the opponent mixture) and "how to beat them" (i.e., finding best responses) are underpinned by manually developed game theoretical principles such as fictitious play and Double Oracle. In this paper1, we introduce a novel framework--Neural Auto-Curricula (NAC)--that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design. Specifically, we parameterise the opponent selection module by neural networks and the bestresponse module by optimisation subroutines, and update their parameters solely via interaction with the game engine, where both players aim to minimise their exploitability. Surprisingly, even without human design, the discovered MARL algorithms achieve competitive or even better performance with the state-of-the-art population-based game solvers (e.g., PSRO) on Games of Skill, differentiable Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker. Additionally, we show that NAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker. Our work inspires a promising future direction to discover general MARL algorithms solely from data.

arxiv preprint arxiv, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

Multi-AgentReinforcementLearningis ASequenceModelingProblem

Neural Information Processing SystemsFeb-9-2026, 13:56:00 GMT

Recently, such difficulty in multi-agent learning has been eased owing to the introduction ofcentralized training for decentralized execution(CTDE) [11, 45], which allows agents to access the global information andopponents' actions during thetraining phase.

agent, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia > China > Shanghai > Shanghai (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Add feedback

Settlingthe Varianceof Multi-Agent Policy Gradients

Neural Information Processing SystemsFeb-9-2026, 07:37:00 GMT

InProceedingsofthe First International Conferenceon Distributed Artificial Intelligence, pages 1-7, 2019.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

Add feedback

1cd73be1e256a7405516501e94e892ac-Paper.pdf

Neural Information Processing SystemsFeb-7-2026, 17:46:25 GMT

arxiv preprint arxiv, exploitability, oracle, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Orange County > Irvine (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

b2a1c152f14a4b842a9ddb3bd84c62a1-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsAug-17-2025, 21:47:07 GMT

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Europe > Sweden > Skåne County > Malmö (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.68)

Industry:

Information Technology (1.00)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

Add feedback

435cce71b4007699041dfffa4f034079-Paper-Conference.pdf

Neural Information Processing SystemsAug-14-2025, 11:30:53 GMT

diversity, international conference, neural information processing system, (11 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangdong Province > Guangzhou (0.04)
Asia > China > Beijing > Beijing (0.04)
Europe > Czechia > Prague (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

Wen, Muning, Kuba, Jakub Grudzien, Lin, Runji, Zhang, Weinan, Wen, Ying, Wang, Jun, Yang, Yaodong

arXiv.org Artificial IntelligenceOct-28-2022

Large sequence model (SM) such as GPT series and BERT has displayed outstanding performance and generalization capabilities on vision, language, and recently reinforcement learning tasks. A natural follow-up question is how to abstract multi-agent decision making into an SM problem and benefit from the prosperous development of SMs. In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that effectively casts cooperative multi-agent reinforcement learning (MARL) into SM problems wherein the task is to map agents' observation sequence to agents' optimal action sequence. Our goal is to build the bridge between MARL and SMs so that the modeling power of modern sequence models can be unleashed for MARL. Central to our MAT is an encoder-decoder architecture which leverages the multi-agent advantage decomposition theorem to transform the joint policy search problem into a sequential decision making process; this renders only linear time complexity for multi-agent problems and, most importantly, endows MAT with monotonic performance improvement guarantee. Unlike prior arts such as Decision Transformer fit only pre-collected offline data, MAT is trained by online trials and errors from the environment in an on-policy fashion. To validate MAT, we conduct extensive experiments on StarCraftII, Multi-Agent MuJoCo, Dexterous Hands Manipulation, and Google Research Football benchmarks. Results demonstrate that MAT achieves superior performance and data efficiency compared to strong baselines including MAPPO and HAPPO. Furthermore, we demonstrate that MAT is an excellent few-short learner on unseen tasks regardless of changes in the number of agents. See our project page at https://sites.google.com/view/multi-agent-transformer.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2205.14953

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks

Meng, Linghui, Wen, Muning, Yang, Yaodong, Le, Chenyang, Li, Xiyun, Zhang, Weinan, Wen, Ying, Zhang, Haifeng, Wang, Jun, Xu, Bo

arXiv.org Artificial IntelligenceDec-19-2021

Offline reinforcement learning leverages previously-collected offline datasets to learn optimal policies with no necessity to access the real environment. Such a paradigm is also desirable for multi-agent reinforcement learning (MARL) tasks, given the increased interactions among agents and with the enviroment. Yet, in MARL, the paradigm of offline pre-training with online fine-tuning has not been studied, nor datasets or benchmarks for offline MARL research are available. In this paper, we facilitate the research by providing large-scale datasets, and use them to examine the usage of the Decision Transformer in the context of MARL. We investigate the generalisation of MARL offline pre-training in the following three aspects: 1) between single agents and multiple agents, 2) from offline pretraining to the online fine-tuning, and 3) to that of multiple downstream tasks with few-shot and zero-shot capabilities. We start by introducing the first offline MARL dataset with diverse quality levels based on the StarCraftII environment, and then propose the novel architecture of multi-agent decision transformer (MADT) for effective offline learning. MADT leverages transformer's modelling ability of sequence modelling and integrates it seamlessly with both offline and online MARL tasks. A crucial benefit of MADT is that it learns generalizable policies that can transfer between different types of agents under different task scenarios. On StarCraft II offline dataset, MADT outperforms the state-of-the-art offline RL baselines. When applied to online tasks, the pre-trained MADT significantly improves sample efficiency, and enjoys strong performance both few-short and zero-shot cases. To our best knowledge, this is the first work that studies and demonstrates the effectiveness of offline pre-trained models in terms of sample efficiency and generalisability enhancements in MARL.

arxiv preprint arxiv, offline dataset, yaodong yang, (10 more...)

arXiv.org Artificial Intelligence

2112.02845

Country:

Asia > China > Shanghai > Shanghai (0.04)
Africa > Togo (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.46)
Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.84)

Add feedback

Multi-Agent Constrained Policy Optimisation

Gu, Shangding, Kuba, Jakub Grudzien, Wen, Munning, Chen, Ruiqing, Wang, Ziyan, Tian, Zheng, Wang, Jun, Knoll, Alois, Yang, Yaodong

arXiv.org Artificial IntelligenceOct-6-2021

Developing reinforcement learning algorithms that satisfy safety constraints is becoming increasingly important in real-world applications. In multi-agent reinforcement learning (MARL) settings, policy optimisation with safety awareness is particularly challenging because each individual agent has to not only meet its own safety constraints, but also consider those of others so that their joint behaviour can be guaranteed safe. Despite its importance, the problem of safe multi-agent learning has not been rigorously studied; very few solutions have been proposed, nor a sharable testing environment or benchmarks. To fill these gaps, in this work, we formulate the safe MARL problem as a constrained Markov game and solve it with policy optimisation methods. Our solutions -- Multi-Agent Constrained Policy Optimisation (MACPO) and MAPPO-Lagrangian -- leverage the theories from both constrained policy optimisation and multi-agent trust region learning. Crucially, our methods enjoy theoretical guarantees of both monotonic improvement in reward and satisfaction of safety constraints at every iteration. To examine the effectiveness of our methods, we develop the benchmark suite of Safe Multi-Agent MuJoCo that involves a variety of MARL baselines. Experimental results justify that MACPO/MAPPO-Lagrangian can consistently satisfy safety constraints, meanwhile achieving comparable performance to strong baselines.

agent, constraint, safety constraint, (12 more...)

arXiv.org Artificial Intelligence

2110.02793

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Filters

Collaborating Authors

yaodong yang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Neural Auto-Curricula

Multi-AgentReinforcementLearningis ASequenceModelingProblem

Settlingthe Varianceof Multi-Agent Policy Gradients

1cd73be1e256a7405516501e94e892ac-Paper.pdf

b2a1c152f14a4b842a9ddb3bd84c62a1-Paper-Datasets_and_Benchmarks.pdf

6fe6a8a6e6cb710584efc4af0c34ce50-Paper.pdf

435cce71b4007699041dfffa4f034079-Paper-Conference.pdf

Multi-Agent Reinforcement Learning is a Sequence Modeling Problem

Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks

Multi-Agent Constrained Policy Optimisation